In this assignment, we explore the gapminder data set using the data wrangling library dplyr and the plotting library ggplot2. Both are loaded through the tidyverse library.
First, let us load the libraries used to explore the data set: the gapminder package, which provides the data, and the tidyverse.
library(gapminder) # loads gapminder data
library(tidyverse) # loads the tidyverse library
## Warning: replacing previous import by 'tibble::as_tibble' when loading
## 'broom'
## Warning: replacing previous import by 'tibble::tibble' when loading 'broom'
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
In this section, we shall check the features of the data such as the structure, the number and data types of the variables, the dimension of the data, among others.
The str() function is used to check the structure and class of the data. It gives comprehensive information about the data. Let us see how it is used and the output it produces.
str(gapminder) # displays the structure of the data like the variables, their types, the dimension, ...
## Classes 'tbl_df', 'tbl' and 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Observe that this function has given us a lot of information about the data: the class of the data, the number of variables, the type of each variable, and so on. These properties can also be checked individually using other functions.
Before we use other functions to explore the structure of this data, let us view its first few rows using the head() function:
head(gapminder) # displays the first few rows of the data (default is 6)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
This function gave us the first 6 rows of the data. We can display more by passing the number of rows as a second argument:
head(gapminder,15) # displays the first 15 rows of the gapminder data
## # A tibble: 15 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## 11 Afghanistan Asia 2002 42.1 25268405 727.
## 12 Afghanistan Asia 2007 43.8 31889923 975.
## 13 Albania Europe 1952 55.2 1282697 1601.
## 14 Albania Europe 1957 59.3 1476505 1942.
## 15 Albania Europe 1962 64.8 1728137 2313.
Also, the last few rows of a data frame can be displayed by using the tail() function:
tail(gapminder) # displays the last few rows of the data (default is 6)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Zimbabwe Africa 1982 60.4 7636524 789.
## 2 Zimbabwe Africa 1987 62.4 9216418 706.
## 3 Zimbabwe Africa 1992 60.4 10704340 693.
## 4 Zimbabwe Africa 1997 46.8 11404948 792.
## 5 Zimbabwe Africa 2002 40.0 11926563 672.
## 6 Zimbabwe Africa 2007 43.5 12311143 470.
Similarly, tail(gapminder,n) can be used to display the last \(n\) rows of the gapminder data.
Now, let us check the features of the gapminder data one after the other.
class(gapminder) # displays the class of the data
## [1] "tbl_df" "tbl" "data.frame"
This tells us that the gapminder data is a data frame.
The next few functions are used to display the dimension, the number of rows and columns, and the variables in the data.
dim(gapminder) # displays the dimension of the data
## [1] 1704 6
nrow(gapminder) # displays the number of rows
## [1] 1704
ncol(gapminder) # displays the number of columns
## [1] 6
names(gapminder) # displays the variables/fields in the data
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
The sapply() function applies a given function to each element of a list; for a data frame, that means each variable (column). Let us illustrate how it works by using it to get the class of each of the variables in the gapminder data.
We use the sapply() function to check the class of the variables in our data as follows.
sapply(gapminder,class) # applies the 'class()' function to each variable in the gapminder data
## country continent year lifeExp pop gdpPercap
## "factor" "factor" "integer" "numeric" "integer" "numeric"
| Variable | Data class |
|---|---|
| country | factor (categorical) |
| continent | factor (categorical) |
| year | integer |
| lifeExp | numeric |
| pop | integer |
| gdpPercap | numeric |
As another example, let us use the sapply() function to apply the function typeof() to the gapminder data.
sapply(gapminder,typeof) # applies the 'typeof()' function to each variable in the gapminder data
## country continent year lifeExp pop gdpPercap
## "integer" "integer" "integer" "double" "integer" "double"
Note that this is not what we might expect: for example, ‘country’ is categorical data, but typeof() reports integer. This is because typeof() returns the internal storage type of an R object, and a factor is stored as integer codes that index into its levels.
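The difference between class() and typeof() for a factor can be seen on a small example. This is a minimal sketch in base R; the vector `continent` below holds a few hypothetical values and is not the gapminder column itself.

```r
# A small factor with hypothetical values (not the full gapminder column)
continent <- factor(c("Asia", "Europe", "Asia", "Africa"))

class(continent)       # "factor": how R treats the object
typeof(continent)      # "integer": how the values are stored internally
levels(continent)      # the labels, sorted alphabetically by default
as.integer(continent)  # the integer codes that index into the levels
```

The integer codes are what str() prints after the level names, which is why the continent column of an Asian country shows the code 3 in the str() output.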
Let us try these same explorations by piping into the functions.
gapminder %>% # loads the gapminder data (and the result is piped into the next function)
sapply(class) # applies the class function on the loaded data
## country continent year lifeExp pop gdpPercap
## "factor" "factor" "integer" "numeric" "integer" "numeric"
gapminder %>% # loads the gapminder data (and the result is piped into the next function)
sapply(typeof) # applies the typeof() function on the loaded data
## country continent year lifeExp pop gdpPercap
## "integer" "integer" "integer" "double" "integer" "double"
Let us begin by considering the continent variable. We first explore this variable using a bar chart.
barchart <- gapminder %>% # loads the gapminder data
ggplot(aes(x=continent,fill=continent)) + # calls the ggplot function and specify the axis and fill
geom_bar() # specify the type of plot
barchart # displays the bar chart
This bar chart shows the number of observations available for each continent. For instance, we have over 600 observations from Africa, around 400 for Asia, and around 300 for the Americas.
This information can be shown more precisely by extracting the ‘continent’ column from the data and summarizing it as shown below:
The table() and summary() functions
sumAry <- gapminder %>% # loads gapminder data
select(continent) %>% # extract the column for continents
summary() # present a summary of the result
sumAry
## continent
## Africa :624
## Americas:300
## Asia :396
## Europe :360
## Oceania : 24
We can also use the table() function to show the same result.
cont_table <- gapminder %>% # loads gapminder data
select(continent) %>% # extract the column for continents
table() # present the result in a table
cont_table # display the table
## .
## Africa Americas Asia Europe Oceania
## 624 300 396 360 24
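The counts can also be turned into shares of the whole, which is the quantity a pie chart encodes. The sketch below uses only base R and retypes the counts from the table() output by hand; the names `cont_counts` and `cont_share` are introduced here for illustration.

```r
# Counts copied from the table() output above
cont_counts <- c(Africa = 624, Americas = 300, Asia = 396, Europe = 360, Oceania = 24)

sum(cont_counts)                       # total number of rows: 1704
cont_share <- prop.table(cont_counts)  # each count divided by the total
round(100 * cont_share, 1)             # percentages per continent
```

Oceania accounts for only about 1.4% of the rows, so its slice will be very thin in any chart that maps counts to angles.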
The table() and summary() functions as used here give the precise number of observations available for each continent. Next, let us display this information using a simple pie chart.
pie(cont_table) # use the table constructed earlier to plot a pie chart
We can also plot a pie chart using ggplot() function
piechart <- barchart + coord_polar() # plot a pie chart with the ggplot function
piechart
Observe that the code used to generate this plot reuses the bar chart we created earlier. However, this plot is not well suited to our data because Oceania, which has very few observations, is hard to see. Let us try another pie chart:
# plotting another type of pie chart
piechart2 <- barchart +
coord_polar("y",start=0) +
scale_fill_brewer(palette = "Dark2") +
theme_minimal()
piechart2
This plot is better in the sense that it shows the information for all the continents, but it may take some time to understand. Let us explore other types of pie charts available in ggplot.
# plotting another type of pie chart
piechart3 <- ggplot(gapminder,aes(x="",fill=factor(continent))) + geom_bar(width=1) +
coord_polar("y",start=0) +
scale_fill_brewer(palette = "Dark2") +
theme_minimal()
piechart3
Let us plot a bullseye chart
# plotting a bullseye chart
ggplot(gapminder,aes(x="",fill=factor(continent))) + geom_bar(width=1) +
coord_polar() +
scale_fill_brewer(palette = "Dark2") +
theme_minimal()
We can also extract the observations for each continent using the filter() function.
gapminder %>%
filter(continent == 'Africa') %>% # extract the data for African countries only
dim() # displays the dimension of the extracted data
## [1] 624 6
This shows that we have 624 observations for African countries. We can do the same for other continents or for individual countries. As one more example, let us check how many observations are from Nigeria.
gapminder %>%
filter(continent == 'Africa') %>% # extract the data for African countries only
filter(country == 'Nigeria') %>% # extract the observations from Nigeria.
dim() # displays the dimension of the extracted data
## [1] 12 6
There are 12 of them. We can skip the step that first extracts the data for Africa and extract the observations for Nigeria directly from the gapminder data, as shown below:
gapminder %>%
filter(country == 'Nigeria') %>% # extract the observations from Nigeria.
dim() # displays the dimension of the extracted data
## [1] 12 6
Here, we consider the quantitative variables. First, let us check which years of data are available.
gapminder %>%
select(year) %>% # extracts the 'year' column
unique() # displays each entry uniquely
## # A tibble: 12 x 1
## year
## <int>
## 1 1952
## 2 1957
## 3 1962
## 4 1967
## 5 1972
## 6 1977
## 7 1982
## 8 1987
## 9 1992
## 10 1997
## 11 2002
## 12 2007
This shows that the data cover 12 distinct years, from 1952 to 2007 in five-year steps.
Let us look at the summary statistics of the numerical variables:
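As a quick arithmetic check, the list of years and the overall row count fit together. This sketch uses only numbers already reported above (142 country levels from str() and 1704 rows from dim()); it is consistent with every country having one row per year, although it does not prove it on its own.

```r
years <- seq(1952, 2007, by = 5)  # five-year steps, as listed in the output above
length(years)                     # 12

n_countries <- 142                # number of levels of `country` reported by str()
n_countries * length(years)       # 1704, the number of rows reported by dim()
```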
gapminder %>%
select(pop) %>% # extract population column
summary() # gives summary of population
## pop
## Min. :6.001e+04
## 1st Qu.:2.794e+06
## Median :7.024e+06
## Mean :2.960e+07
## 3rd Qu.:1.959e+07
## Max. :1.319e+09
Here are the summary statistics for population (pop):
| Statistic | Value |
|---|---|
| Minimum | 60,010 |
| First quartile | 2.794 million |
| Median | 7.024 million |
| Mean | 29.60 million |
| Third quartile | 19.59 million |
| Maximum | 1.319 billion |
gapminder %>%
select(lifeExp) %>% # extract life expectancy column
summary() # displays the summary statistic for life expectancy
## lifeExp
## Min. :23.60
## 1st Qu.:48.20
## Median :60.71
## Mean :59.47
## 3rd Qu.:70.85
## Max. :82.60
Here are the summary statistics for life expectancy (lifeExp):
| Statistic | Value (years) |
|---|---|
| Minimum | 23.60 |
| First quartile | 48.20 |
| Median | 60.71 |
| Mean | 59.47 |
| Third quartile | 70.85 |
| Maximum | 82.60 |
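Each entry that summary() reports can also be computed on its own, which helps when only one statistic is needed. The sketch below is base R and uses the twelve life expectancy values for Afghanistan as they are displayed (rounded) in the head(gapminder, 15) output; the vector name `life_exp` is introduced here for illustration.

```r
# Life expectancy for Afghanistan, copied (rounded) from the head(gapminder, 15) output
life_exp <- c(28.8, 30.3, 32.0, 34.0, 36.1, 38.4, 39.9, 40.8, 41.7, 41.8, 42.1, 43.8)

min(life_exp)                       # smallest value
max(life_exp)                       # largest value
median(life_exp)                    # middle of the sorted values
mean(life_exp)                      # arithmetic average
quantile(life_exp, c(0.25, 0.75))   # first and third quartiles
summary(life_exp)                   # all six statistics at once
```

The first and third quartiles are the values below which roughly 25% and 75% of the observations fall.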
Let us plot the life expectancy for the entire gapminder data set:
gapminder %>% # loads the gapminder data
ggplot(aes(lifeExp)) + # calls the ggplot function
geom_histogram(bins=30,aes(fill=continent)) # specifies the type of plot
Hmm… This plot does not give us detailed information about the life expectancy for each continent. Let us plot the same data using a density plot.
gapminder %>% # loads the gapminder data
ggplot(aes(lifeExp)) + # calls the ggplot function
geom_density(aes(fill=continent)) # specifies the type of plot
This looks better but we still have some overlapping of data. Maybe we can plot for each continent separately?
Now, let us plot the histogram and density plot for each continent separately using the facet_wrap() function.
gapminder %>% # loads the gapminder data
ggplot(aes(lifeExp)) + # calls the ggplot function
geom_histogram(bins=30,aes(fill=continent)) + # specifies the type of plot
facet_wrap(~continent) # specifies that it should plot for each continent separately
How about the GDP per capita for each continent? Since gdpPercap spans several orders of magnitude, we put it on a log scale.
gapminder %>% # loads the gapminder data
ggplot(aes(gdpPercap)) + # calls the ggplot function
geom_histogram(bins=30,aes(fill=continent)) + # specifies the type of plot
facet_wrap(~continent) + # specifies that it should plot for each continent separately
scale_x_log10() # log scale for gdpPercap (a log scale on the counts would drop the empty bins)
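A note on log scales and histograms: a log transform of the bin counts sends every empty bin to minus infinity, so those bars cannot be drawn, whereas the measured values themselves are strictly positive and can be log-scaled safely (in ggplot2, with scale_x_log10()). The sketch below illustrates this in base R; `bin_counts` is a hypothetical set of counts, and `gdp_values` are the rounded gdpPercap values from the head() output.

```r
# Hypothetical bin counts for a histogram; some bins are empty
bin_counts <- c(120, 35, 0, 4, 0, 1)

log10(bin_counts)                     # empty bins become -Inf
sum(is.infinite(log10(bin_counts)))   # 2 bins cannot be drawn on a log y-axis

# The measured values are strictly positive, so log-scaling them is safe
gdp_values <- c(779, 821, 853, 836, 740, 786)  # rounded gdpPercap values from head()
all(is.finite(log10(gdp_values)))     # TRUE
```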
gapminder %>% # loads the gapminder data
ggplot(aes(lifeExp)) + # calls the ggplot function
geom_density(aes(fill=continent)) + # specifies the type of plot (geom_density() has no bins argument)
facet_wrap(~continent) # specifies that it should plot for each continent separately
gapminder %>% # loads the gapminder data
ggplot(aes(gdpPercap)) + # calls the ggplot function
geom_density(aes(fill=continent)) + # specifies the type of plot (geom_density() has no bins argument)
facet_wrap(~continent) # specifies that it should plot for each continent separately
In this section, we explore and visualize all or part of the gapminder data using plots and figures.
Let us begin with a scatter plot of gdpPercap against population (pop).
gapminder %>% # loads the gapminder data
select(pop,gdpPercap,continent) %>% # extract the columns to be considered
ggplot(aes(pop,gdpPercap)) + # calling the ggplot function
geom_point(aes(color=continent)) + # specifies the type of plot
scale_y_log10() + scale_x_log10() # log scale for both axes
Hmm, this looks messy! Let us plot the same columns but for one continent only, say Europe.
gapminder %>% # loads the gapminder data
filter(continent=='Europe') %>% # extracts data for European countries only
select(pop,gdpPercap,continent) %>% # select the columns to be considered
ggplot(aes(pop,gdpPercap)) + # calling the ggplot function
geom_point(aes(color=continent)) + # specifies the type of plot
scale_y_log10() + scale_x_log10() # log scale for both axes
We can do this in a more fancy way by plotting for each continent separately using the facet_wrap() function.
gapminder %>% # loads the gapminder data
select(pop,gdpPercap,continent) %>% # select the variables to be considered
ggplot(aes(pop,gdpPercap)) + # calling the ggplot function
geom_point(aes(color=continent)) + # specifies the type of plot
facet_wrap(~continent) +
scale_y_log10() + scale_x_log10() # log scale for both axes
This shows the GDP per capita vs population for each continent.
One thing we can see from these plots is that in Asia, observations with a high population tend to have a low GDP per capita, and those with a high GDP per capita tend to have a low population. These graphs let us compare GDP per capita and population across countries and years within each continent.
Let us plot population vs life expectancy for this data and colour them by continent.
gapminder %>% # loads the gapminder data
select(lifeExp,pop,continent) %>% # select the variables to be considered
ggplot(aes(x=lifeExp,y=pop)) + # calling the ggplot function
geom_point(aes(color=continent))+ # specifies the type of plot
scale_y_log10()
Now, let us plot this information continent by continent.
gapminder %>% # loads the gapminder data
ggplot(aes(lifeExp,pop)) + # calling the ggplot function
geom_point(aes(color=continent)) + # specifies the type of plot
facet_wrap(~continent) + scale_y_log10()
How about something similar for African and European countries only?
gapminder %>% # loads the gapminder data
filter(continent=='Europe' | continent=='Africa') %>% # extract data for African and European countries only
ggplot(aes(lifeExp,pop)) + # calling the ggplot function
geom_point(aes(color=continent)) + # specifies the type of plot
scale_y_log10() # specify log scale for y
This shows that most of the African countries have lower life expectancy than the European countries.
Let us see how this information is displayed with a box plot (geom_boxplot()) and a violin plot (geom_violin()):
gapminder %>% # loads the gapminder data
filter(continent=='Europe' | continent=='Africa') %>% # extract data for African and European countries only
ggplot(aes(lifeExp,pop,fill=continent)) + # calling the ggplot function
geom_boxplot(aes(color=continent)) + # specifies the type of plot
scale_y_log10() # specify log scale for y
gapminder %>% # loads the gapminder data
filter(continent=='Europe' | continent=='Africa') %>% # extract data for African and European countries only
select(lifeExp,pop,continent) %>%
ggplot(aes(lifeExp,pop,fill=continent)) +
geom_violin(aes(color=continent)) + # specifies the type of plot
scale_y_log10() # specify log scale for y
## Warning: position_dodge requires non-overlapping x intervals
Now, let us use geom_boxplot() to show the life expectancy data for each continent.
gapminder %>% # loads the gapminder data
ggplot(aes(continent,lifeExp,fill=continent)) +
geom_boxplot(aes(color=continent)) # specifies the type of plot
What about combining geom_violin() with geom_jitter() for the same data?
gapminder %>% # loads the gapminder data
ggplot(aes(continent,lifeExp)) +
geom_violin(aes(color=continent,fill=continent)) + # specifies the type of plot
geom_jitter(alpha=0.2) # specifies additional type of plot
Let us use a box plot to show the GDP per capita for each continent in different years, using the facet_wrap() function:
gapminder %>% # loads the gapminder data
ggplot(aes(continent,gdpPercap)) +
geom_boxplot(fill='green') + # specifies the type of plot
scale_y_log10() +
geom_jitter(alpha=0.3,fill='red') + # specifies additional type of plot
facet_wrap(~year) # specify that each year should be plotted separately
How about plotting the same figure for some selected years only? Let us do this for the following years: 1952, 1962, 1972, 1982, 1997, and 2007.
gapminder %>% # loads the gapminder data
filter(year %in% c(1952, 1962, 1972, 1982, 1997, 2007)) %>% # keeps only the selected years
ggplot(aes(continent,gdpPercap)) +
geom_boxplot(fill='green') +
scale_y_log10() +
geom_jitter(alpha=0.3,fill='red') +
facet_wrap(~year)
What if we want to do the same for all years from a given year onward, or for all years before it? Say, from 1982 to 2007:
gapminder %>% # loads the gapminder data
filter(year >= 1982) %>% # keeps the years from 1982 onward (year is an integer column)
ggplot(aes(continent,gdpPercap)) +
geom_boxplot(fill='green') +
scale_y_log10() +
geom_jitter(alpha=0.3,fill='red') +
facet_wrap(~year)
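A note on comparing years: year is an integer column, so numeric comparisons are the safe choice. If a number is compared with a quoted string, R converts the number to text and compares character by character, which gives the intended answer only because every year here has four digits. The sketch below shows this in base R; the value 900 is a hypothetical three-digit "year" used purely to expose the pitfall.

```r
years <- c(1952L, 1982L, 2007L)

years >= 1982        # numeric comparison: FALSE TRUE TRUE
years >= "1982"      # same result here, but only because every year has four digits

# With a hypothetical three-digit year the string comparison goes wrong:
900 >= 1982          # FALSE, as expected
900 >= "1982"        # TRUE, because "900" sorts after "1982" as text

# Selecting several specific years reads more clearly with %in%
years %in% c(1952, 2007)   # TRUE FALSE TRUE
```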
Let us do the same for the years before 1982:
gapminder %>% # loads the gapminder data
filter(year < 1982) %>% # keeps the years before 1982
ggplot(aes(continent,gdpPercap)) +
geom_boxplot(fill='green') +
scale_y_log10() +
geom_jitter(alpha=0.3,fill='red') +
facet_wrap(~year)
Let us use the geom_smooth() function to plot the population trend for all the continents.
gapminder %>% # loads the gapminder data
ggplot( aes(year,pop,colour=continent)) +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Population is increasing over the years.
How about the same for only Asia and Africa over the years?
gapminder %>% # loads the gapminder data
filter(continent=="Asia" | continent=="Africa") %>%
ggplot( aes(year,pop,colour=continent)) +
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Asia's population is increasing faster than that of Africa over the years.
Let us plot the GDP per capita for all the continents using the same function, this time with a linear fit.
gapminder %>% # loads the gapminder data
ggplot( aes(year,gdpPercap,colour=continent)) +
geom_smooth(method="lm") # fits a straight line per continent (the argument is method, not model)
GDP per capita for each continent is increasing over the years, with Africa having the slowest growth.
Here, we extract the observations from Africa and then those from Nigeria, and explore the extracted data in more detail.
Let us extract the observations for all the African countries using the filter() function:
AfricData <- filter(gapminder,continent=="Africa") # extracts the observations from Africa
dim(AfricData) # displays the size of the data
## [1] 624 6
head(AfricData,20) # displays the first 20 rows of the extracted data
## # A tibble: 20 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Algeria Africa 1952 43.1 9279525 2449.
## 2 Algeria Africa 1957 45.7 10270856 3014.
## 3 Algeria Africa 1962 48.3 11000948 2551.
## 4 Algeria Africa 1967 51.4 12760499 3247.
## 5 Algeria Africa 1972 54.5 14760787 4183.
## 6 Algeria Africa 1977 58.0 17152804 4910.
## 7 Algeria Africa 1982 61.4 20033753 5745.
## 8 Algeria Africa 1987 65.8 23254956 5681.
## 9 Algeria Africa 1992 67.7 26298373 5023.
## 10 Algeria Africa 1997 69.2 29072015 4797.
## 11 Algeria Africa 2002 71.0 31287142 5288.
## 12 Algeria Africa 2007 72.3 33333216 6223.
## 13 Angola Africa 1952 30.0 4232095 3521.
## 14 Angola Africa 1957 32.0 4561361 3828.
## 15 Angola Africa 1962 34 4826015 4269.
## 16 Angola Africa 1967 36.0 5247469 5523.
## 17 Angola Africa 1972 37.9 5894858 5473.
## 18 Angola Africa 1977 39.5 6162675 3009.
## 19 Angola Africa 1982 39.9 7016384 2757.
## 20 Angola Africa 1987 39.9 7874230 2430.
Now, we extract the observations for Nigeria only.
NigData <- filter(gapminder,country=="Nigeria")
dim(NigData)
## [1] 12 6
head(NigData)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Nigeria Africa 1952 36.3 33119096 1077.
## 2 Nigeria Africa 1957 37.8 37173340 1101.
## 3 Nigeria Africa 1962 39.4 41871351 1151.
## 4 Nigeria Africa 1967 41.0 47287752 1015.
## 5 Nigeria Africa 1972 42.8 53740085 1698.
## 6 Nigeria Africa 1977 44.5 62209173 1982.
Let us check how the life expectancy of Nigerians has changed over the years. We can plot life expectancy against year using the geom_line() function.
NigData %>%
ggplot( aes(year,lifeExp)) +
geom_line() +
geom_point()
Now, let us plot Nigeria's population over the years.
NigData %>% # load the extracted observation for Nigeria
ggplot( aes(year,pop)) + # calls the ggplot function
geom_line() + # specifies the type of plot you want
geom_area(fill='green') # fill the area under the graph
Here, I have used the geom_area() function to shade the area under the curve.
We do the same for GDP per capita over the years.
NigData %>%
ggplot( aes(year,gdpPercap)) +
geom_line() +
geom_point()
Next, we check whether there are relationships between the variables in the observations for Nigeria.
Let us start by checking life expectancy and population over the years.
NigData %>%
count(lifeExp,pop,year) %>%
ggplot(aes(x=lifeExp,y=pop)) +
geom_point(aes(color=year,size=year))+
scale_y_log10()
From the graph, it looks like there is a positive relationship between them. Let us verify this by computing the correlation between the two variables:
cor(NigData$lifeExp,NigData$pop) # computes correlation between lifeExp and pop
## [1] 0.853939
This confirms that there is a strong positive relationship between them. What about life expectancy and GDP per capita?
NigData %>%
ggplot(aes(x=lifeExp,y=gdpPercap)) +
geom_point(aes(color=year,size=year))+
scale_y_log10()
It is hard to tell from the plot whether there is a relationship. Let us check by computing the correlation coefficient.
cor(NigData$lifeExp,NigData$gdpPercap) # computes correlation between lifeExp and gdpPercap
## [1] 0.7360712
There is a positive relationship between lifeExp and gdpPercap. Lastly, we plot population vs GDP per capita.
NigData %>%
ggplot(aes(x=pop,y=gdpPercap)) +
geom_point(aes(color=year,size=year))+
scale_y_log10()
It is not easy to tell from the plot whether a relationship exists. Let us verify with the correlation coefficient.
cor(NigData$pop,NigData$gdpPercap) # computes correlation between population and gdpPercap
## [1] 0.6928366
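The coefficients computed with cor() above are Pearson correlations, which can be reproduced by hand from the definition. The sketch below is base R and uses only the first six Nigeria rows as displayed by head(NigData), so its value will differ from the 12-row results above; the names `r_manual` and `r_builtin` are introduced here for illustration.

```r
# First six Nigeria rows, copied from the head(NigData) output (lifeExp rounded as displayed)
life_exp <- c(36.3, 37.8, 39.4, 41.0, 42.8, 44.5)
pop      <- c(33119096, 37173340, 41871351, 47287752, 53740085, 62209173)

# Pearson correlation: sum of cross-deviations divided by the
# square root of the product of the summed squared deviations
r_manual <- sum((life_exp - mean(life_exp)) * (pop - mean(pop))) /
  sqrt(sum((life_exp - mean(life_exp))^2) * sum((pop - mean(pop))^2))

r_builtin <- cor(life_exp, pop)   # same quantity via the built-in function
all.equal(r_manual, r_builtin)    # TRUE
```

Note that all three variables rise with year for Nigeria, so these positive correlations may largely reflect a shared time trend rather than a direct link between the variables.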
The analyst wanted to extract the data for Rwanda and Afghanistan only, and used the following code:
filter(gapminder, country == c("Rwanda","Afghanistan"))
## # A tibble: 12 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1957 30.3 9240934 821.
## 2 Afghanistan Asia 1967 34.0 11537966 836.
## 3 Afghanistan Asia 1977 38.4 14880372 786.
## 4 Afghanistan Asia 1987 40.8 13867957 852.
## 5 Afghanistan Asia 1997 41.8 22227415 635.
## 6 Afghanistan Asia 2007 43.8 31889923 975.
## 7 Rwanda Africa 1952 40 2534927 493.
## 8 Rwanda Africa 1962 43 3051242 597.
## 9 Rwanda Africa 1972 44.6 3992121 591.
## 10 Rwanda Africa 1982 46.2 5507565 882.
## 11 Rwanda Africa 1992 23.6 7290203 737.
## 12 Rwanda Africa 2002 43.4 7852401 786.
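No, the analyst did not succeed. The call returned only 12 rows, covering 6 years for each country, instead of all 12 years for each country (24 rows in total). This happens because == compares the country column with the two-element vector position by position, recycling the vector along the column, so each row is tested against only one of the two names.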
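The recycling behaviour of == can be reproduced on a tiny vector. This is a minimal base R sketch; `country` below is a toy stand-in with two rows per country, not the gapminder column.

```r
# A toy stand-in for the country column: two rows per country
country <- c("Afghanistan", "Afghanistan", "Rwanda", "Rwanda")

# == recycles the right-hand side to c("Rwanda", "Afghanistan", "Rwanda", "Afghanistan")
# and compares position by position, so only every other row matches
country == c("Rwanda", "Afghanistan")    # FALSE TRUE TRUE FALSE
```

Only half of the rows come back TRUE, which mirrors the 12 rows out of 24 seen in the output above.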
We see below that in the gapminder data, there are 12 rows for Rwanda and 12 for Afghanistan.
filter(gapminder, country == "Rwanda")
## # A tibble: 12 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Rwanda Africa 1952 40 2534927 493.
## 2 Rwanda Africa 1957 41.5 2822082 540.
## 3 Rwanda Africa 1962 43 3051242 597.
## 4 Rwanda Africa 1967 44.1 3451079 511.
## 5 Rwanda Africa 1972 44.6 3992121 591.
## 6 Rwanda Africa 1977 45 4657072 670.
## 7 Rwanda Africa 1982 46.2 5507565 882.
## 8 Rwanda Africa 1987 44.0 6349365 848.
## 9 Rwanda Africa 1992 23.6 7290203 737.
## 10 Rwanda Africa 1997 36.1 7212583 590.
## 11 Rwanda Africa 2002 43.4 7852401 786.
## 12 Rwanda Africa 2007 46.2 8860588 863.
filter(gapminder, country == "Afghanistan")
## # A tibble: 12 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## 11 Afghanistan Asia 2002 42.1 25268405 727.
## 12 Afghanistan Asia 2007 43.8 31889923 975.
The following code can be used to extract all the observations for Rwanda and Afghanistan from the gapminder data.
filter(gapminder, country == "Rwanda" | country == "Afghanistan" )
## # A tibble: 24 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 14 more rows
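An alternative to chaining comparisons with | is the %in% operator, which tests each row against a whole set of values and scales better when more countries are involved. The sketch below uses base R on a small made-up data frame (`toy`, with hypothetical rows) so that it runs without any packages.

```r
# Toy data frame with hypothetical rows, standing in for gapminder
toy <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Nigeria", "Rwanda", "Rwanda"),
  year    = c(1952, 1957, 1952, 1952, 1957)
)

wanted <- c("Rwanda", "Afghanistan")

# %in% tests each row against the whole set, so no rows are skipped
kept <- toy[toy$country %in% wanted, ]
nrow(kept)    # 4: both rows for each of the two countries
```

The same condition can be passed to dplyr's filter(), for example filter(gapminder, country %in% c("Rwanda", "Afghanistan")).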
Let us install the kableExtra package:
install.packages("kableExtra")
Next, we load the knitr and kableExtra libraries:
library(knitr)
library(kableExtra)
Let us use the kable() function to display data as a formatted table. This function is slow when used on the gapminder data, so I will use another data set to illustrate how it works.
Data of death rates in Virginia (1940).
head(VADeaths) # displays first few rows of data
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
dim(VADeaths) # displays the dimension
## [1] 5 4
Note: The output of the kable() function makes it difficult for my github_document to render, so I have displayed the results in an html_document and commented out the code here.
Let us use kable() to display the VADeaths data:
VADeaths %>%
kable() %>%
kable_styling()
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
It looks nicer than what you get using the head() function. We can also add striped rows:
VADeaths %>%
kable() %>%
kable_styling(bootstrap_options = c("striped"))
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
We can also make the table smaller.
VADeaths %>%
kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F)
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
Align the table to the left:
VADeaths %>%
kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position="left")
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
Align the table to the right:
VADeaths %>%
kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position="right")
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
Adjusting the font size:
VADeaths %>%
kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position="right", font_size=8)
| | Rural Male | Rural Female | Urban Male | Urban Female |
|---|---|---|---|---|
| 50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
| 55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
| 60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
| 65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
| 70-74 | 66.0 | 54.3 | 71.1 | 50.0 |